This analysis explores relationships between indicators across
countries. For first impressions of all the indicators under
observation, see the ‘indicators.R’ file in the ‘Scripts’
directory.
For this analysis specifically, we restrict
ourselves to the following World Bank indicators: countries’ percentage
of agricultural land, CO2 emissions per capita in megatonnes, surface
area in square kilometers, and total population.
The exploration is divided into two research questions, namely:
1. Is there a relationship between the percentage of agricultural land and CO2 emissions per capita across countries?
2. Does the size of the surface area of the country play a role?
| Variable | Indicator Name | Definition |
|---|---|---|
| AG.LND.AGRI.ZS | Agricultural land (% of land area) | Agricultural land refers to the share of land area that is arable, under permanent crops, and under permanent pastures. Arable land includes land defined by the FAO as land under temporary crops (double-cropped areas are counted once), temporary meadows for mowing or for pasture, land under market or kitchen gardens, and land temporarily fallow. Land abandoned as a result of shifting cultivation is excluded. Land under permanent crops is land cultivated with crops that occupy the land for long periods and need not be replanted after each harvest, such as cocoa, coffee, and rubber. This category includes land under flowering shrubs, fruit trees, nut trees, and vines, but excludes land under trees grown for wood or timber. Permanent pasture is land used for five or more years for forage, including natural and cultivated crops. |
| AG.SRF.TOTL.K2 | Surface area (sq. km) | Surface area is a country’s total area, including areas under inland bodies of water and some coastal waterways. |
| EN.GHG.CO2.MT.CE.AR5 | Carbon dioxide (CO2) emissions (total) excluding LULUCF (Mt CO2e) | A measure of annual emissions of carbon dioxide (CO2), one of the six Kyoto greenhouse gases (GHG), from the building sector (subsector of the energy sector) including IPCC 2006 codes 1.A.4 Residential and other sectors, 1.A.5 Non-Specified. The measure is standardized to carbon dioxide equivalent values using the Global Warming Potential (GWP) factors of IPCC’s 5th Assessment Report (AR5). |
| SP.POP.TOTL | Population, total | Total population is based on the de facto definition of population, which counts all residents regardless of legal status or citizenship. The values shown are midyear estimates. |
The dataset provided to us contains longitudinal observations of 25 countries with 18 indicators each, recorded yearly from 2000 to 2021.
1.) Percentage of agricultural land and CO2 emissions per capita
1.1.) Heat map of CO2 emissions
1.2.) Boxplot of CO2 emissions
1.3.) Point-line plot of agricultural land with faceted countries
1.4.) Boxplot of agricultural land
1.5.) Scatter plot of the variables of interest
1.6.) Point-line plot of CO2 emissions with faceted countries and color scale
1.7.) Point-line plot of the variables of interest with faceted countries; normalized
2.) Role of surface area in previous relationship
2.1.) Point-line plot of surface area with faceted countries
2.2.) Bar plot of absolute changes with faceted countries; changing countries
2.3.) Point-line plot of relative changes with faceted countries; changing countries
2.4.) Boxplot of surface area
2.5.) Scatter plot of the variables of interest with color scale and with faceted grouping
2.6.) Point-line plot of CO2 emissions with faceted grouping and color scale
2.7.) Point-line plot of the variables of interest with faceted grouping; normalized
We analyze how the percentage of agricultural land relates to
CO2 emissions per capita. To get an overview of the
data of interest and to be able to evaluate later insights correctly, we
start by looking at the two indicators separately.
Starting with the distribution of the CO2 emissions in megatonnes for each country over the observed time frame, we get the following information.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.246 8.782 50.328 672.243 232.015 12717.655
Occasionally there are immense differences in a country’s CO2 emissions from one year to the next, visible as clear jumps in the sequential color scale. After investigating this phenomenon in the database and consulting our supervisors, we concluded that the false values stem from mishandling by the database during the download of the data set. This mishandling explains the seemingly arbitrary distribution of single entries that are too low by a factor of ten, in some cases even by a factor of 100. Moving forward, we accept these anomalies and treat them as the error-induced outliers they are, keeping them in mind and taking later insights with a grain of salt.
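The report accepts these anomalies rather than correcting them. Purely as an illustration of how such entries could be flagged, the following minimal sketch marks values that fall far below the median of their own series. The function name `flag_scale_jumps`, the threshold, and the toy series are assumptions made for this example and are not part of the original scripts.

```r
# Hypothetical helper: flag entries that are suspiciously small relative to
# the median of the same series (e.g. too low by a factor of 10 or 100).
flag_scale_jumps <- function(x, threshold = 5) {
  ratio <- x / median(x, na.rm = TRUE)
  ratio < 1 / threshold
}

# Toy series for one country (invented values, not the report's data)
toy_series <- c(50.1, 51.3, 5.2, 52.0, 0.53, 53.4)
flags <- flag_scale_jumps(toy_series)
print(flags)
```

A median-based rule is used here because the median itself is hardly affected by a few erroneous entries; any real threshold would have to be checked against the data.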
## Variation between the countries: 2787843
The CO2 emissions vary strongly within the countries. At the same time, there are enormous differences in absolute amounts between the countries. The greatest challenge may therefore lie in comparing the values and trends of different countries, even though the data is provided on a per capita basis. For later comparisons of the two variables of interest, we may switch to a logarithmic display of the CO2 emissions to better visualize the span of the countries’ deviations.
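How the variation between countries reported above was computed is not shown in this section. As a hedged sketch, one common scatter decomposition splits the total variance into a between-country and a within-country part. The function name `scatter_decomposition`, the division by the number of observations, and the toy data are assumptions for illustration only.

```r
# Toy data: hypothetical countries and values, not the report's data
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 4),
  value   = c(1, 2, 3, 4, 10, 12, 14, 16, 100, 90, 110, 100)
)

# Split the total variance into between-group and within-group parts
scatter_decomposition <- function(values, groups) {
  n           <- length(values)
  grand_mean  <- mean(values)
  group_means <- ave(values, groups)  # each observation mapped to its group mean
  c(
    between = sum((group_means - grand_mean)^2) / n,
    within  = sum((values - group_means)^2) / n,
    total   = sum((values - grand_mean)^2) / n
  )
}

res <- scatter_decomposition(toy$value, toy$country)
print(res)
```

By construction the between and within parts add up to the total, so a large between share indicates that countries differ more from each other than their values change over time.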
Furthermore, the distribution of the percentage of agricultural land delivers the following information.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.564 29.805 43.411 42.040 56.129 80.439
## Variation between the countries: 437.7944
In contrast to the CO2 emissions, the percentages of
agricultural land have rather low variance within the countries.
However, there are recognizable differences between the countries,
ranging from about five to about eighty percent. Since we operate on a
bounded percentage scale, though, comparisons should work quite
well.
Next, we bring the two variables back
together. For this purpose, we analyze the distribution of the collected
data while disregarding which country each observation comes from, which
gives us the following cloud of data points. Note that the CO2 emissions
are now displayed on a logarithmic scale to counter the wide disparity
of values in the data.
We recognize a slightly positive linear relationship between the
variables: countries with a higher percentage of agricultural land
account, on average, for more megatonnes of CO2 emissions per
capita, while countries with a lower percentage of agricultural land
account for fewer CO2 emissions. However, this view completely ignores
the development over time and the country each observation belongs to.
To take those factors back into consideration, we first
distinguish among the countries by faceting our visualization for an
in-depth comparison of the indicators for each country over time.
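The pooled relationship described above can be sketched as a linear fit of log-transformed CO2 emissions on the percentage of agricultural land. This is only an illustration under stated assumptions: the column names `agri_pct` and `co2` and the toy values are invented, and the report itself may have judged the relationship visually rather than by a fitted model.

```r
# Toy data: invented values, not the report's data
toy <- data.frame(
  agri_pct = c(10, 20, 30, 40, 50, 60, 70, 80),
  co2      = c(1, 3, 2, 20, 15, 150, 90, 1200)
)

# Log-transform CO2 to counter the wide disparity of values
toy$log_co2 <- log10(toy$co2)

# Pooled linear fit, ignoring country and year
fit   <- lm(log_co2 ~ agri_pct, data = toy)
slope <- coef(fit)[["agri_pct"]]
r     <- cor(toy$agri_pct, toy$log_co2)

print(c(slope = slope, correlation = r))
```

A positive slope on the log scale means that emissions grow multiplicatively with the agricultural share in the toy data, which is a weaker statement than a linear relationship in raw megatonnes.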
We chose scatter points instead of lines to soften the impact of the
false CO2 entries, since lines display the error-induced outliers in a
more extreme way. In addition, the development between the data entries
is unknown, which prevents us from assuming linear transitions.
However, the facets, sorted in ascending order of average percentage of
agricultural land, show no obvious connection between the two
indicators: the CO2 emissions develop quite
arbitrarily regardless of the associated percentage of agricultural
land.
To dig even further, we now normalize the CO2 emissions as well as the percentage of agricultural land within each country. This lets us investigate relative changes on the same scale for both indicators and compare them without the constraints caused by different orders of magnitude.
We recognize differing developments of the two indicators for
many of the countries. Aruba shows no agricultural land data points
because its percentage does not change over the years, so the
min-max normalization cannot be computed. For the other countries, the
developments of the two indicators follow no obvious pattern. This
leads us to the next step, which is the introduction of the
countries’ surface area.
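The min-max normalization within each country, including the constant-series case seen for Aruba, can be sketched as follows. The helper name `min_max`, the column names, and the toy data are assumptions for this example; returning `NA` for a constant series is one possible convention, chosen here because a plain computation would divide zero by zero.

```r
# Min-max normalization to [0, 1]; a constant series has max == min,
# so the result is undefined and returned as NA.
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    return(rep(NA_real_, length(x)))
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy data: country "B" mimics a country whose percentage never changes
toy <- data.frame(
  country  = rep(c("A", "B"), each = 3),
  agri_pct = c(40, 45, 50, 11, 11, 11)
)

# Normalize within each country
toy$agri_norm <- ave(toy$agri_pct, toy$country, FUN = min_max)
print(toy)
```

Plotting functions typically drop missing values, which would explain why a country with a constant series shows no points for that indicator.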
A further aspect that might change the observed relationship is the countries’ surface area, which we now introduce as an additional variable. The definition of surface area in the dictionary at the beginning shows how important it is to distinguish clearly between a country’s surface area and its land area. While the percentage of agricultural land expresses the agriculturally used area as a share of the land area, the surface area refers to the country’s complete area, including areas under inland bodies of water and some coastal waterways. A direct comparison of the two indicators is therefore only possible with this difference in mind.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 180 243610 796100 2461119 1285220 17098250
## Number of countries without changes: 10
Several countries show no change in surface area at all throughout the time span of interest. Before moving on, we therefore zoom in on the countries with changes to understand how relevant those changes are.
For the vast majority of the countries, the changes amount to less than 1,000 square kilometers over the whole time span, so for our analysis they are negligible and these countries can be treated like the non-changing ones. Similar insights can be derived from the relative changes.
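The absolute and relative changes per country can be sketched as below, comparing the first and last observed year and counting the countries whose surface area never changes. The column names and toy values are assumptions, and the report may define changes differently, for example year over year.

```r
# Toy data: invented values, not the report's data
toy <- data.frame(
  country     = rep(c("A", "B", "C"), each = 3),
  year        = rep(2000:2002, times = 3),
  surface_km2 = c(1000, 1000, 1000, 5000, 5000, 5050, 200, 198, 198)
)

# Summarize the change between the first and last year for each country
changes <- do.call(rbind, lapply(split(toy, toy$country), function(d) {
  d <- d[order(d$year), ]
  first_value <- d$surface_km2[1]
  last_value  <- d$surface_km2[nrow(d)]
  data.frame(
    country        = d$country[1],
    unchanged      = length(unique(d$surface_km2)) == 1,
    abs_change_km2 = last_value - first_value,
    rel_change_pct = 100 * (last_value - first_value) / first_value
  )
}))

n_unchanged <- sum(changes$unchanged)
print(changes)
```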
For each country, even those with changes during the time span, the surface area changes by at most two percent. As stated before, these changes are negligible for our analysis, which allows us to assume that countries do not change their rank with respect to surface area during the time frame. This claim is confirmed by the scatter decomposition.
## Variation between the countries: 1.783129e+13
Moreover, from here on we drop the focus on this variable’s development
over time and shift the perspective towards whether
the average absolute surface area plays any role in the
relationship between agricultural land and CO2 emissions for
the observed countries.
This allows us to
classify the countries into quantile groups based on their average
surface area, as done during the analysis for all provided
indicators. We group the
countries into the following segments:
| Quantile | Q1 | Q2 | Q3 | Q4 | Q5 |
|---|---|---|---|---|---|
| Surface area | Very Low | Low | Medium | High | Very High |
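A grouping into five equally sized segments like the one in the table can be sketched with `quantile()` and `cut()`. The country labels, the column name `avg_surface_km2`, and the values are illustrative assumptions; the original grouping code is not shown in this section.

```r
# Toy data: ten hypothetical countries with invented average surface areas
toy <- data.frame(
  country         = LETTERS[1:10],
  avg_surface_km2 = c(180, 2500, 41000, 243610, 450000,
                      796100, 1285220, 3287000, 9600000, 17098250)
)

# Break points at the 0%, 20%, ..., 100% quantiles
breaks <- quantile(toy$avg_surface_km2, probs = seq(0, 1, by = 0.2))

# Assign each country to one of five labeled groups
toy$size_group <- cut(
  toy$avg_surface_km2,
  breaks         = breaks,
  include.lowest = TRUE,  # keep the smallest country in the first group
  labels         = c("Very Low", "Low", "Medium", "High", "Very High")
)

print(table(toy$size_group))
```

Because the grouping is based on each country's average over the whole period, a country stays in the same group for every year.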
Applying the newly established grouping to the earlier comparison of the percentage of agricultural land and the megatonnes of CO2 emissions per capita shows the following relationships.
At first glance, the relationship carries over to each of the
five groups as a weak to medium positive linear relationship; the
countries with moderate surface area are the only ones with almost no
measurable relationship. For all other groups, the earlier finding
seems to apply again: countries with a higher percentage of agricultural
land account, on average, for more megatonnes of CO2 emissions per
capita, while countries with a lower percentage of agricultural land
account for fewer CO2 emissions.
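One way to put a number on the strength of the relationship within each group is a per-group correlation between the agricultural share and log-transformed CO2 emissions. The sketch below uses invented data with two hypothetical groups and assumed column names; it is not how the report derived its assessment, which appears to rest on the faceted plots.

```r
# Toy data: two hypothetical surface-area groups with invented values
toy <- data.frame(
  size_group = rep(c("Low", "High"), each = 5),
  agri_pct   = c(10, 20, 30, 40, 50, 15, 25, 35, 45, 55),
  co2        = c(1, 2, 4, 8, 16, 300, 250, 400, 350, 500)
)

# Pearson correlation of agricultural share and log10(CO2) within each group
group_cor <- sapply(split(toy, toy$size_group), function(d) {
  cor(d$agri_pct, log10(d$co2))
})

print(group_cor)
```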
Further, the
distribution of the data points within the facets catches the eye:
the very small and the large countries have comparatively low CO2
emissions, while the very large countries show, as presumably expected,
by far the highest CO2 emissions in the data set. The role
of surface area seems to be quite small, as the relationship stays
basically unchanged regardless of the countries’ classification.
The biggest anomalies in the CO2 emissions, with the
percentage of agricultural land in mind, seem to be the countries with
moderate and very large surface area. On the one hand, the moderate-area
countries have comparatively high percentages of agricultural land, but
these do not translate into any obvious differences in
CO2 emissions compared to the other groups. On the contrary,
their CO2 emissions are on average even lower than those of the
very small and small countries. On the other hand, the very large
countries stand out with the highest CO2 emissions among all groups,
as presumably expected. The groups differ only marginally in how
strongly emissions rise over time, with all groups
showing slightly increasing trends in CO2 emissions per
capita.
To round off this exploration, we want to dig deeper by
looking at the time-specific distribution within the groups with another
twist.
If we finally return to the normalized comparison from earlier, we can now repeat it with the data grouped by surface area category.
We cannot identify any obvious connection between the CO2 emissions per capita and the percentage of agricultural land, even with the countries of interest categorized by surface area.
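A grouped version of the normalized comparison could look like the following sketch, which normalizes within each country first and then averages the normalized values per group and year. Whether the report averaged normalized country values or normalized group-level values is not stated, so this order of steps, like the names and toy data, is an assumption.

```r
# Min-max normalization; a constant series yields NA
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(NA_real_, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy data: three hypothetical countries in two surface-area groups
toy <- data.frame(
  country    = rep(c("A", "B", "C"), each = 3),
  size_group = rep(c("Low", "Low", "High"), each = 3),
  year       = rep(2000:2002, times = 3),
  co2        = c(1, 2, 3, 10, 30, 20, 500, 400, 600)
)

# Step 1: normalize within each country
toy$co2_norm <- ave(toy$co2, toy$country, FUN = min_max)

# Step 2: average the normalized values per group and year
group_trend <- aggregate(co2_norm ~ size_group + year, data = toy, FUN = mean)
print(group_trend)
```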
Here we notice something fascinating: while the four groups of smaller
countries show developments that are partly parallel and at least partly
comparable over the years, the very large countries stand out. Although
their percentage of agricultural land has fallen drastically over the
years, CO2 emissions have risen regardless. One possible explanation
could be that for the very large countries in our dataset, the decline in
agricultural land may have been accompanied by an increase in
urbanization, leading to even greater CO2 emissions than
those caused by agriculture.
Whether this really is the case
remains speculation at this point, but with further information on
aspects such as urban, forest or water area, further analyses of this
issue are possible and advised.